Spotify is one of the larger music streaming services available today, with 345 million active users [1]. Instead of requiring listeners to buy CDs or download every song they want to hear, Spotify gives access to millions of songs without downloading them to a device.
In this project, we want to discover which genre is the most popular. Our second question is whether any feature is strongly correlated with other features. We expect the data to point to a few genres that may be the most popular, and we expect certain features to be strongly correlated with one another.
The data we are using come from Spotify, cover the years 1921 to 2020, and include over 175,000 audio tracks. We found the data on Kaggle [2]. The dataset groups the data by artist, genre, and year. Eleven variables are measured in the dataset: acousticness, danceability, duration, energy, liveness, instrumentalness, loudness, speechiness, valence, popularity, and tempo.
Energy is a perceptual measure of the intensity and activity of a track on a scale from 0.0 to 1.0. Perceptual features that contribute to it include dynamic range, perceived loudness, timbre, onset rate, and general entropy. Liveness ranges from 0.0 to 1.0 and estimates whether an audience is present in a recording; a value above 0.8 indicates a strong likelihood that the track is live. Acousticness is a confidence measure of whether the track is acoustic. It varies from 0.0 to 1.0, with 1.0 representing high confidence that the track is acoustic. Loudness is the overall loudness averaged over the entire track; it is measured in decibels (dB) and ranges from -60 to 0. Danceability combines tempo, rhythm stability, beat strength, and regularity. It rates how suitable a track is for dancing from 0.0 to 1.0, with 1.0 being the most danceable. Duration is the length of the track in milliseconds (ms). Instrumentalness tracks whether a song contains vocals. “Ooh” and “aah” sounds are treated as instrumental in this context, while rap and spoken word tracks are considered vocal. Instrumentalness ranges from 0.0 to 1.0, with 1.0 being the most instrumental. Speechiness is roughly the opposite of instrumentalness: it measures the relative length of the track that contains any kind of human voice. Tempo is the tempo of the track in beats per minute (BPM). Valence measures the positiveness of the track; higher valence corresponds to more cheerful and upbeat songs. Lastly, popularity is calculated by an algorithm based on the total number of plays the track has had and how recent those plays are.
In the rest of our report, we first group the genres into broader categories and then analyze the features across those genres. We want to discover which genres are the most popular by using t-tests to compare the genres' mean popularity. We will also compare features to each other and test the correlation between each pair of features to see whether they have a strong linear relationship. In the end, we hope to discover how popularity is related to different genres, as well as how different features relate to each other.
The original data contained 3232 genres. We condensed these by using regular expressions to split the genre names into terms, counting how often each term occurs, and keeping the 20 most frequent terms.
Here are all the original genres.
These are the top 100 terms found across all the genres. They will be used to create the simplified genres.
These are the top 20 terms found, which are the ones we will use. Note that this covers only 60.7% of the data, since 39.3% of the data do not fall under these top 20 categories.
We use these top 20 terms to create a more concisely labeled dataset (along with the label “other”).
This graph shows the number of occurrences of each simplified genre.
This graph shows the number of occurrences of each simplified genre without the “other” category.
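The term-counting and relabeling steps described above could be sketched roughly as follows. This is a minimal illustration rather than the code used for this report: the function names, the regular expression, the sample genre names, and the rule that a genre matching several terms receives the most frequent one are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of letters, digits, "&" and "'".
TERM_PATTERN = re.compile(r"[a-z0-9&']+")


def top_terms(genres, k):
    """Return the k terms that appear in the most genre names."""
    counts = Counter()
    for genre in genres:
        # Count each term at most once per genre name.
        counts.update(set(TERM_PATTERN.findall(genre.lower())))
    return [term for term, _ in counts.most_common(k)]


def simplify(genre, terms):
    """Label a genre with the most frequent top term it contains, else 'other'."""
    words = set(TERM_PATTERN.findall(genre.lower()))
    for term in terms:  # terms are ordered from most to least frequent
        if term in words:
            return term
    return "other"


# Hypothetical genre names, for illustration only.
genres = ["dance pop", "pop rock", "indie rock", "classic rock", "deep house", "bebop"]
terms = top_terms(genres, k=2)
labels = {genre: simplify(genre, terms) for genre in genres}
```

With the real data, `k` would be 20. Matching on whole terms (rather than substrings) keeps a name such as "bebop" from being labeled "pop"; whether the original analysis made the same choice is not stated.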
The second question we want to answer is whether any features have strong linear correlations with other features. First, we computed the r-value for every pair of features; these are shown in the table below.
The raw r-values
The r-values in an easier-to-view format. Red and blue signify stronger correlations.
As this table shows, some pairs of features have a strong linear relationship, while others do not. Next, we kept only the pairs whose r-value has an absolute value over 0.9 to find the strongest feature relationships. We chose 0.9 as the threshold somewhat arbitrarily, since many features correlated with each other. Results for other thresholds are shown below.
Thresholds: 0.9, 0.8, 0.7, 0.6, none.
In the graph and table below, we can again see strong correlations between energy and three other features: acousticness, loudness, and tempo. It is also interesting to note that those same three features correlate with each other, although to a lesser degree.
It is hard to say exactly what this means, but it suggests a few possibilities and speaks to the difference between correlation and causation. For example, there is an r-value of -0.8715355 between tempo and loudness. However, since both of those features correlate even more strongly with energy, it may be that their relationship to energy is what matters most. These features are all highly related, and the fact that they also correlate highly with each other suggests that they all measure something similar.
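The correlation filtering described in this section could be sketched as follows. This is an illustrative outline, not the code behind the tables in this report: the function names and the feature values are made up for the example, and only the 0.9 threshold comes from the text.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strong_pairs(features, threshold):
    """Return {(feature_a, feature_b): r} for every pair with |r| above threshold."""
    result = {}
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson_r(col_a, col_b)
        if abs(r) > threshold:
            result[(name_a, name_b)] = r
    return result


# Made-up feature values for four tracks, for illustration only.
features = {
    "energy": [0.1, 0.4, 0.5, 0.9],
    "loudness": [-20.0, -12.0, -10.0, -4.0],
    "acousticness": [0.9, 0.7, 0.4, 0.1],
    "liveness": [0.3, 0.1, 0.4, 0.2],
}
pairs = strong_pairs(features, threshold=0.9)
```

Filtering on the absolute value keeps strong negative relationships (such as energy and acousticness in this toy data) alongside strong positive ones. Lowering `threshold` reproduces the idea behind the other threshold views.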
Look. Boxplots.
Density plots. These are messy, so in the next section we will try to make sense of them.
We ran t-tests to find differences between genres for each feature. The t-test statistic [3] is as follows:
\[ t = \frac{m_a - m_b}{\sqrt{\frac{s_a^2}{n_a}+\frac{s_b^2}{n_b}}} \]
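Here \(m_a\) and \(m_b\) are the sample means of the two genres, \(s_a^2\) and \(s_b^2\) are their sample variances, and \(n_a\) and \(n_b\) are their sample sizes. We use this test statistic to calculate the p-value by finding the corresponding quantile of Student's t distribution with \(\max(n_a, n_b)-1\) degrees of freedom. We run this test between every combination of genres for all features: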
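As a rough sketch of the calculation above, the statistic and a two-sided p-value can be computed with the Python standard library alone. The function names and sample values are hypothetical, the p-value is approximated by numerically integrating the t density (an implementation choice made only for this example), and the degrees of freedom follow the \(\max(n_a, n_b)-1\) rule stated in the text.

```python
from math import exp, lgamma, pi, sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """t = (m_a - m_b) / sqrt(s_a^2 / n_a + s_b^2 / n_b), using sample variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Approximate P(|T| > |t|) with Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| <= T <= |t|)
    return max(0.0, 1.0 - central_mass)


# Hypothetical feature values for two genres, for illustration only.
genre_a = [1, 2, 3, 4, 5]
genre_b = [2, 3, 4, 5, 6]

t = t_statistic(genre_a, genre_b)
df = max(len(genre_a), len(genre_b)) - 1  # degrees of freedom as stated in the text
p = two_sided_p(t, df)
```

Running this for every pair of genres and every feature gives the tables that follow.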
All t-tests between genres for acousticness.
All t-tests between genres for danceability.
All t-tests between genres for duration_ms.
All t-tests between genres for energy.
All t-tests between genres for instrumentalness.
All t-tests between genres for liveness.
All t-tests between genres for loudness.
All t-tests between genres for speechiness.
All t-tests between genres for tempo.
All t-tests between genres for valence.
All t-tests between genres for popularity.
Filtered to show only the significant differences (p-value < 0.5).
We find the most popular genre by taking the genre with the highest mean popularity.
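Ranking genres by mean popularity can be sketched as below. The function name and the popularity values are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_popularity_by_genre(rows):
    """Group (genre, popularity) pairs by genre and average the popularity."""
    grouped = defaultdict(list)
    for genre, popularity in rows:
        grouped[genre].append(popularity)
    return {genre: mean(values) for genre, values in grouped.items()}


# Hypothetical (simplified genre, popularity) rows, for illustration only.
rows = [("rap", 70), ("rap", 60), ("pop", 55), ("pop", 65), ("jazz", 20)]

means = mean_popularity_by_genre(rows)
most_popular = max(means, key=means.get)  # genre with the highest mean popularity
```

The highest mean alone does not show that a genre is distinguishable from the runners-up, which is why the t-tests for popularity are revisited next.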
The t-tests for popularity (again).
Isolate the most popular genre.
Look only at the most popular genre (rap).
Isolate the genres whose comparison with rap did not have a p-value < 0.5; these cannot be ruled out as being as popular as rap.
Does this make sense? (Context)
Somewhat. The mean popularities seem about the same.
We also discovered which features have the strongest linear correlations with each other and which have no linear relationship. We found that energy has a correlation with an absolute value over 0.9 with three other features: acousticness, loudness, and tempo. Acousticness has a negative correlation with energy, while loudness and tempo both have positive correlations with energy. Considering that acousticness, loudness, and tempo are all based on set measurements, while energy is calculated from the intensity and activity in the song, it is plausible that acousticness, loudness, and tempo all contribute to the energy of a song, although correlation alone cannot establish this.
A shortcoming of our analysis is that we do not know how many songs are included in the data for each year. Some years' data may be based on more songs than others.
Future work on this dataset could involve testing more of the relationships between features and seeing whether they can be modeled well. We could also look for datasets from other music streaming services, such as Apple Music and Pandora.